ABSTRACT

In this paper we address the problem of modeling relational
data, which appear in many applications such as social network
analysis, recommender systems and bioinformatics. Previous
studies either consider latent feature based models while
disregarding local structure in the network, or focus
exclusively on capturing the local structure of objects with
latent blockmodels without coupling it to the latent
characteristics of objects. To combine the benefits of both
lines of work, we propose a novel model that simultaneously
incorporates the effect of latent features and covariates, if
any, and the effect of latent structure that may exist in the
data. To achieve this, we model the relation graph as a
function of both latent feature factors and latent cluster
memberships of objects, so as to collectively discover
globally predictive intrinsic properties of objects and
capture latent block structure in the network to improve
prediction performance. We also develop an optimization
transfer algorithm based on a generalized EM-style strategy to
learn the latent factors. We evaluate the proposed model on
link prediction and cluster analysis tasks; extensive
experiments on synthetic data and several real world datasets
suggest that the proposed LFBM model outperforms other state
of the art approaches on the evaluated tasks.
Data Mining; J.4 [Social and Behavioral Sciences]: So-

1. INTRODUCTION

Relational data, which consist of interrelated objects with
multiple relation types, have become ubiquitous: in online
social networks people connect to each other through
friendship, and research papers are connected by citation or
co-authorship. Modeling relational data has therefore arisen
as a fundamental task in many applications, which involves
predicting new relations among objects and discovering latent
structure in the networked data. For instance, given the
partially observed relational data from a social network, one
may be interested in predicting the missing relationships
between unobserved pairs of individuals, or in identifying
the groups of users who share a common interest in a
particular product or service.

However, the complexity of relational data makes statistical
modeling a very challenging task. First, the correlations
among objects give rise to various structural patterns, which
exhibit the property of stochastic equivalence in the
relational data [5]. This characteristic implies that the
objects can be divided into clusters whose members have
similar patterns of relations to other objects, i.e., the
cluster structure can be either dense or sparse, which
deviates from the classical clustering assumption that
strongly correlated data always form dense clusters. Taking
an online social network as an example, people in the same
company can form dense circles due to their professional
relationships, while others can constitute a sparse group in
which they share the same preference for buying some product
on Groupon but may not be linked to each other, as
illustrated in Figure 1. Second, relational data is quite
sparse, because each graph generated from the relational data
involves a number of objects, each of which is connected to
only a tiny proportion of the whole graph; this calls for
statistical models capable of learning from rare, noisy and
largely missing observations. Third, in addition to the
structure information, the relational data may contain extra
side information specific to the objects. Thus, encoding
heterogeneous information sources in the relational data is
required for flexible modeling. Finally, large scale
relational data in many real world applications call for
statistical models that scale efficiently.

Previous work can be classified into feature based and
structure based models. The feature based models employ a
latent matrix factorization framework to learn latent factors
for each object and each relation, and make predictions by
taking appropriate inner products. Their strength lies in the
relative ease of their continuous optimization and in their
excellent predictive performance. The representative model is
the Multiplicative Latent Factor Model [5], which associates
with each user a low dimensional latent sender factor and a
latent receiver factor. However, this approach disregards the
local structure in the relational data that is due to other
latent unmeasured factors, and also lacks interpretable
representations of the latent structure. In contrast,
structure based models focus exclusively on capturing latent
structures in the relational data [2]. The latent structures
discovered by such models can provide insights about the
interactions in the relational data, which are useful in the
absence of side information. In fact, these latent structures
provide a parsimonious model of the interactions among
objects. However, since this kind of approach does not adjust
for the effects of objects' features, the resulting latent
structure may contain redundant information.

In this paper, we propose a statistical model that
simultaneously incorporates the effects of latent features
and covariates, if any, and the impact of any latent
structure that may exist in the relational data. To achieve
this, we model the objects and relations in the observed data
as a function of both latent feature factors and latent
structural factors, so as to collectively discover globally
predictive intrinsic properties of objects and capture the
latent structure in the data. More precisely, we assign each
object to one and only one latent cluster, which partitions
the relation matrix into a small number of blocks or
co-clusters. In this case, the estimated latent feature
factors of objects and the corresponding probabilities of
being clustered into certain latent blocks are taken to
represent the latent factors that contribute to the
interactions in the relation matrix. The procedure of
modeling the relational data is shown in Figure 1. By
coupling these latent factors, our proposed model can not
only provide better predictive performance but also discover
interpretable latent block structure.

Contributions: This paper provides a predictive modeling
approach for relational data that integrates the information
sources from the features as well as the local structures in
the relation matrix. Specifically, our work makes the
following contributions:

1. We present a novel approach that models the relational
data as a function of latent feature factors and latent
block structural factors through our proposed Latent
Factor Block Model (LFBM).

2. To efficiently learn the LFBM model, we propose an
optimization transfer algorithm based on the Generalized
Expectation-Maximization (EM) method, the so-called
Minorization-Maximization (MM) algorithm, to infer the
latent factors and model parameters.

3. Extensive experiments are conducted on synthetic data and
several real world datasets. The proposed model is shown to
outperform state-of-the-art methods in terms of predictive
performance and clarity of the latent cluster structures.

The paper is structured as follows. We first briefly
introduce the problem definition in Section 2. The proposed
framework based on the Latent Factor Block Model and the
model specification are presented in Section 3. We derive an
efficient optimization algorithm to learn the model in
Section 4. We then describe experiments on synthetic data as
well as several real world datasets, and provide comparisons
with state-of-the-art methods in Section 5. Section 6
discusses related work. In Section 7 we present conclusions
and future work.

2. PROBLEM DEFINITION 

Before introducing our proposed statistical model, we first
give the notation used throughout this paper. Suppose we have
a set of objects $\{x_1, \ldots, x_N\}$. Observations
consisting of pairwise measurements are represented by the
relation matrix $S = \{S_{ij} \in \{0, 1, ?\}\}_{i,j=1}^{N}$,
where 1 denotes an observed present relation, 0 denotes an
absent relation, and ? denotes a missing relation. We then
use the binary indicator matrix $W$ to indicate whether or
not a relation is observed. More specifically, $W_{ij} = 1$
means that $S_{ij}$ is observed, while $W_{ij} = 0$ means
that $S_{ij}$ is missing. We use $z_i \in \{1, \ldots, K\}$,
where $K$ is the number of latent clusters in the relational
data, to denote the cluster assignment of object $x_i$, and
we refer to $z_i$ as the latent cluster membership of object
$x_i$. We furthermore introduce $z_{ik} = [z_i = k]$, i.e.,
$z_{ik} = 1$ if $x_i$ is in the $k$th cluster and 0
otherwise. The latent cluster assignment matrix
$Z = \{z_{ik} : i \in 1, \ldots, N,\; k \in 1, \ldots, K\}$
includes the latent cluster memberships of all the objects in
the relation graph. Given such a relation graph, our goal is
to predict the missing relations between the unobserved pairs
and to discover the latent structure among the objects.

3. OUR PROPOSED MODEL 

We consider modeling the relational data based on latent
factor models. We first define a Bernoulli-Logistic
generative process by which the interactions among objects
are generated, and then propose a hybrid model that captures
the latent feature factors and latent structural information
for each object, combining the benefits of both feature
based and structure based models.

3.1 Bernoulli-Logistic Based Model 

For the relational data, we first assume that the elements
$S_{ij}$ of the relation matrix are Bernoulli-distributed
variables, which are conditionally independent given the
latent variables $H_{ij}$ through the logistic function
$\sigma(H) = \frac{1}{1 + e^{-H}}$. Thus, the likelihood of
the observations in the relation matrix $S$ can be defined
as follows:

$$P(S \mid H) = \prod_{i,j} \left[ \sigma(H_{ij})^{S_{ij}} \left(1 - \sigma(H_{ij})\right)^{1 - S_{ij}} \right]^{W_{ij}} \qquad (1)$$
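As a minimal sketch of the likelihood above (pure Python; the function names `sigmoid`, `log1p_exp` and `log_likelihood` and the nested-list layout are illustrative assumptions, not part of the paper), the logarithm of the likelihood sums $S_{ij} H_{ij} - \log(1 + \exp(H_{ij}))$ over the observed entries only:

```python
import math

def sigmoid(h):
    """Logistic function 1 / (1 + exp(-h)), written via tanh for stability."""
    return 0.5 * (1.0 + math.tanh(0.5 * h))

def log1p_exp(h):
    """Numerically stable log(1 + exp(h))."""
    return max(h, 0.0) + math.log1p(math.exp(-abs(h)))

def log_likelihood(S, W, H):
    """Sum of W_ij * (S_ij * H_ij - log(1 + exp(H_ij))) over all pairs."""
    total = 0.0
    for s_row, w_row, h_row in zip(S, W, H):
        for s, w, h in zip(s_row, w_row, h_row):
            if w:  # W_ij = 0 marks a missing relation, which is skipped
                total += s * h - log1p_exp(h)
    return total

# Toy 2x2 example (hypothetical values): entry (1, 0) is missing.
S = [[1, 0], [0, 1]]
W = [[1, 1], [0, 1]]
H = [[2.0, -1.0], [5.0, 0.0]]
print(log_likelihood(S, W, H))
```

Each observed term equals $\log \sigma(H_{ij})$ when $S_{ij} = 1$ and $\log(1 - \sigma(H_{ij}))$ when $S_{ij} = 0$, so the masked sum is the log of the product form given above.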

To characterize the latent variable $H_{ij}$ in the framework
of latent feature models, we assume that there exists a
latent feature factor $u_i \in \mathbb{R}^{d}$ for object
$x_i$, where $d$ is the dimension of the latent feature
factor, which can be used to encode observable attributes
(e.g. a user's profile) or latent semantic topics (e.g. a
movie's genre). Furthermore, based on the latent cluster
membership $z_i$, we introduce a latent block matrix
$C \in \mathbb{R}^{K \times K}$ to explicitly capture the
latent local structure, where $C_{kk}$ denotes the
probability of a link existing between objects within the
same $k$th cluster, and $C_{kl}$ denotes the probability of
an object in the $k$th cluster linking to an object in the
$l$th cluster. Then the latent variable $H_{ij}$ can be
defined as follows:

$$H_{ij} = u_i v_j^{T} + z_i C z_j^{T} + \epsilon \qquad (2)$$

Here, $u_i \in \mathbb{R}^{d}$ and $v_j \in \mathbb{R}^{d}$
denote the latent sender feature vector and the latent
receiver feature vector of each object, respectively.^1 The
inner product $u_i v_j^{T}$ then provides the probability of
a relation between the two objects based on their latent
features.

Then, $z_i \in \mathbb{R}^{K}$ is considered as the latent
cluster indicator vector of each object, which indicates that
object $x_i$ is associated with the $k$th cluster. The form
$z_i C z_j^{T}$ provides a general model for discovering
different latent cluster structures in the relational data,
i.e., dense or sparse clusters. More specifically, we can use
the block matrix $C$ to represent various types of latent
cluster structure in the relational data. For example, the
model can learn only sparse clusters by fixing the diagonal
elements of $C$ to zero, while it can find only dense
clusters by fixing $C$ to the identity matrix, which means
that the block structure provides a principled way to adapt
the model to learn specific types of latent cluster
structure. $\epsilon$ denotes the sparsity of the relations
in the network, and can also be considered as a kind of bias
term.^2
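To make the role of the block matrix concrete, the following minimal sketch (pure Python; `one_hot`, `block_term` and the two example matrices are illustrative assumptions, not the paper's code) evaluates the block term for the two special cases mentioned above. The off-diagonal value 1.0 in the sparse case is an arbitrary choice made only for illustration.

```python
def one_hot(k, K):
    """Indicator (row) vector with a 1 in position k."""
    return [1.0 if c == k else 0.0 for c in range(K)]

def block_term(z_i, C, z_j):
    """Compute z_i C z_j^T for indicator vectors z_i and z_j."""
    return sum(z_i[k] * C[k][l] * z_j[l]
               for k in range(len(z_i)) for l in range(len(z_j)))

K = 3
# Identity matrix: only pairs inside the same cluster get a positive score.
dense_C = [[1.0 if k == l else 0.0 for l in range(K)] for k in range(K)]
# Zero diagonal: pairs inside the same cluster get no contribution.
sparse_C = [[0.0 if k == l else 1.0 for l in range(K)] for k in range(K)]

same = (one_hot(0, K), one_hot(0, K))
diff = (one_hot(0, K), one_hot(1, K))
print(block_term(same[0], dense_C, same[1]), block_term(diff[0], dense_C, diff[1]))
print(block_term(same[0], sparse_C, same[1]), block_term(diff[0], sparse_C, diff[1]))
```

Because the indicator vectors are one-hot, the block term simply selects the entry of the block matrix indexed by the two cluster labels.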

More generally, when side information or covariates about
the objects or the interactions are also available, we can
easily incorporate them into the proposed model in a similar
way as in Generalized Linear Models. Hence, the latent
interaction matrix can be defined as follows:

$$H_{ij} = \beta^{T} x_{ij} + u_i v_j^{T} + z_i C z_j^{T} + \epsilon \qquad (3)$$

where the vector $x_{ij}$ represents the side information
about the relation between objects $i$ and $j$, and $\beta$
denotes the regression coefficients associated with the
pre-defined side information, which can be assigned a normal
prior distribution.
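The combined predictor can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `link_probability`, the use of hard cluster indices `k` and `l` in place of indicator vectors, and all numeric values are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def link_probability(beta, x_ij, u_i, v_j, k, l, C, eps=0.0):
    """sigma(beta^T x_ij + u_i v_j^T + C[k][l] + eps).

    k and l are the cluster indices of objects i and j, so C[k][l]
    equals z_i C z_j^T for one-hot indicator vectors.
    """
    h = dot(beta, x_ij) + dot(u_i, v_j) + C[k][l] + eps
    return 0.5 * (1.0 + math.tanh(0.5 * h))  # stable logistic function

# Hypothetical toy values.
beta, x_ij = [0.5], [2.0]            # covariate effect: 1.0
u_i, v_j = [1.0, 0.0], [0.0, 1.0]    # feature effect: 0.0
C = [[1.0, -2.0], [-2.0, 1.0]]       # block effect depends on (k, l)

print(link_probability(beta, x_ij, u_i, v_j, 0, 1, C, eps=1.0))  # h = 0
print(link_probability(beta, x_ij, u_i, v_j, 0, 0, C, eps=1.0))  # h = 3
```

With identical covariates and features, the pair placed in the same cluster receives a higher predicted probability than the pair placed in different clusters, which is the effect the block term is meant to add.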

3.2 Model Specification 

So far we have modeled the observed interactions by
combining the benefits of the latent feature factors of
objects with their corresponding latent cluster assignments
and the latent block structure. The integration of these
multiple effects gives the proposed model, which we refer to
as the Latent Factor Block Model (LFBM), better
generalization in link prediction and a more interpretable
account of network structure. The LFBM differs from the
classical factorization-style link prediction model, which
disregards the local latent structure, and also from
traditional clustering based models, which are not flexible
enough to account for side information or to handle missing
relations in the data.

^1 Note that for a directed interaction matrix we use
different latent feature vectors for the same object, as in
the MLFM model [5].

^2 For computational convenience in our experiments, the bias
$\epsilon$ is absorbed into the latent factors $U$ and $V$ by
redefinition.

Moreover, the factorization based latent feature term
$u_i v_j^{T}$ can be used to find more accurate latent
characteristics of the objects, such as the preferences of
users for a certain class of products, and to alleviate the
data sparsity and missing data problems thanks to its good
generalization performance. The cluster structure based
block term $z_i C z_j^{T}$ is capable of learning both dense
and sparse clusters at the same time in the relational data,
and of providing interpretable latent cluster assignments
for each object, which makes link prediction feasible for
unobserved pairs once these pairs have been clustered into a
certain block. Hence, the integration of these effects gives
our proposed model better generalization in prediction and a
better understanding of local structure.

To make our proposed model more accurate, we can impose
prior distributions on the latent factors. For example, the
latent cluster indicator vector of each object can be
generated from a Multinomial distribution. We can also put
normal priors on the latent feature factors and the block
matrix as follows:

$$p(U) = \prod_{i} \mathcal{N}(u_i \mid 0, \lambda_U^{-1} I); \quad p(V) = \prod_{i} \mathcal{N}(v_i \mid 0, \lambda_V^{-1} I); \quad p(C) = \prod_{k,l} \mathcal{N}(C_{kl} \mid 0, \lambda_C^{-1})$$
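As a short worked step, assuming the isotropic normal prior on the latent feature factors with precision $\lambda_U$, $N$ objects and latent dimension $d$, the log-prior reduces to a quadratic penalty. This is where a trace regularizer in a MAP objective comes from; the priors on $V$ and $C$ behave in the same way.

```latex
\begin{aligned}
\log p(U) &= \sum_{i=1}^{N} \log \mathcal{N}(u_i \mid 0, \lambda_U^{-1} I) \\
          &= -\frac{\lambda_U}{2} \sum_{i=1}^{N} u_i u_i^{T}
             + \frac{N d}{2} \log \frac{\lambda_U}{2\pi} \\
          &= -\frac{\lambda_U}{2}\, \mathrm{tr}(U U^{T}) + \text{const}
\end{aligned}
```

Here the rows of $U$ are the vectors $u_i$, so $\mathrm{tr}(U U^{T})$ is the sum of their squared norms, and the constant does not depend on $U$.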



Based on the above descriptions, we can summarize the joint
distribution of our proposed LFBM model as follows:

$$p(S, H, \Theta) = \prod_{i,j} p(S_{ij} \mid H_{ij})\, p(H_{ij} \mid \Theta) \qquad (4)$$

where $\Theta$ denotes the normal prior parameters of the
latent factors in our model. We can then build our model on
the observed data, and provide the parameter estimation
procedure for learning the latent factors in the next
section.

4. MODEL INFERENCE 

Since the inference task is to estimate the latent factors
and parameters of the model, we need to maximize the
posterior distribution $p(U, V, C, Z, \beta \mid S, \Theta)$
given the observed data and the Bernoulli-Logistic model
likelihood. Markov Chain Monte Carlo (MCMC) algorithms have
been adopted in such latent variable models to compute the
posterior distributions; however, they are computationally
expensive and converge slowly. Therefore, in our model we
employ the maximum a posteriori (MAP) strategy to learn the
latent factors and model parameters. The model inference
problem can then be formulated as maximizing the
log-posterior probability as follows:

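$$L(U, V, C \mid Z) = \sum_{i,j} W_{ij} \left( S_{ij} H_{ij} - \log(1 + \exp(H_{ij})) \right) - \frac{\lambda_U}{2} \mathrm{tr}(U U^{T}) - \frac{\lambda_V}{2} \mathrm{tr}(V V^{T}) - \frac{\lambda_C}{2} \mathrm{tr}(C C^{T}) + E \qquad (5)$$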

where $E$ is a constant independent of the model parameters.
Since the optimization objective (5) is not jointly convex
in all the model parameters and latent factors, a globally
optimal solution is nontrivial to obtain. We therefore resort
to an alternating projection algorithm that learns the latent
factors by fixing all but one factor in the objective
function and updating the free factor with a gradient based
method. For instance, to learn the latent feature factor
$u_i$ of one object with all others fixed, we get the
gradient equation as follows:

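$$\nabla L(u_i) = \frac{\partial L}{\partial u_i} = \left( W_{i\cdot} \circ (S_{i\cdot} - \sigma(H_{i\cdot})) \right) V - \lambda_U u_i \qquad (6)$$

where $W_{i\cdot}$, $S_{i\cdot}$ and $H_{i\cdot}$ denote the
$i$th rows of $W$, $S$ and $H$, and $\circ$ is the
element-wise product.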

Then by setting the gradient (6) to zero we can obtain the
update equation for the latent feature factor $u_i$. However,
from (6) it is intractable to derive closed-form iterative
update rules for these latent factors. Even with a Newton
based method, because of the logistic-log-partition (LLP)
function $\mathrm{llp}(x) = \log(1 + \exp(x))$ in the
objective, the computational complexity for the Hessian
matrix with respect to a latent factor is cubic in the number
of model parameters.
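The part of objective (5) that depends on a single $u_i$, and its gradient, can be sketched as follows. This is an illustration under stated assumptions: the argument `b_i` is a hypothetical helper holding every term of $H_{ij}$ that does not involve $u_i$ (covariates, block term, bias), and the function names are not from the paper. The gradient uses $\sigma(H_{ij})$, the derivative of the LLP function.

```python
import math

def log1p_exp(h):
    """Numerically stable log(1 + exp(h)), the LLP function."""
    return max(h, 0.0) + math.log1p(math.exp(-abs(h)))

def sigmoid(h):
    """Logistic function, the derivative of the LLP function."""
    return 0.5 * (1.0 + math.tanh(0.5 * h))

def objective_ui(u_i, V, s_i, w_i, b_i, lam_u):
    """Terms of the log-posterior that depend on u_i (row i only)."""
    total = -0.5 * lam_u * sum(x * x for x in u_i)
    for v_j, s, w, b in zip(V, s_i, w_i, b_i):
        h = sum(x * y for x, y in zip(u_i, v_j)) + b
        total += w * (s * h - log1p_exp(h))
    return total

def gradient_ui(u_i, V, s_i, w_i, b_i, lam_u):
    """sum_j W_ij (S_ij - sigma(H_ij)) v_j - lam_u * u_i."""
    grad = [-lam_u * x for x in u_i]
    for v_j, s, w, b in zip(V, s_i, w_i, b_i):
        h = sum(x * y for x, y in zip(u_i, v_j)) + b
        r = w * (s - sigmoid(h))  # masked residual for pair (i, j)
        for d in range(len(u_i)):
            grad[d] += r * v_j[d]
    return grad

# Hypothetical toy row with three candidate receivers, one unobserved.
u = [0.3, -0.2]
V = [[1.0, 0.5], [-0.4, 2.0], [0.7, 0.7]]
s, w, b = [1, 0, 1], [1, 1, 0], [0.1, -0.3, 0.0]
print(gradient_ui(u, V, s, w, b, lam_u=0.5))
```

A finite-difference comparison of `objective_ui` against `gradient_ui` is a cheap way to check an implementation before plugging it into an iterative solver.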

Thus, we construct an optimization transfer algorithm based
on the Generalized Expectation-Maximization (EM) method [6]
to reduce the complexity of the optimization. During the
optimization procedure, we employ the auxiliary function
approach commonly used in EM-style algorithms: in the E-step
we form a minorizing function, a concave lower bound of the
objective function, and in the M-step we maximize this
minorizing function. Alternating the two steps gives the
so-called Minorization-Maximization (MM) algorithm.
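The minorize-then-maximize idea can be illustrated on a one-parameter logistic problem. The sketch below is not the paper's algorithm: it assumes a scalar parameter `theta`, and it uses the standard fact $\sigma(h)(1 - \sigma(h)) \le 1/4$ to obtain a fixed curvature bound, which is an assumption made here for illustration rather than the matrix $K$ derived in the paper.

```python
import math

def log1p_exp(h):
    return max(h, 0.0) + math.log1p(math.exp(-abs(h)))

def sigmoid(h):
    return 0.5 * (1.0 + math.tanh(0.5 * h))

def objective(theta, xs, ys, lam):
    """Penalized Bernoulli-logistic log-likelihood of a scalar theta."""
    fit = sum(y * x * theta - log1p_exp(x * theta) for x, y in zip(xs, ys))
    return fit - 0.5 * lam * theta ** 2

def gradient(theta, xs, ys, lam):
    return sum((y - sigmoid(x * theta)) * x for x, y in zip(xs, ys)) - lam * theta

def mm_maximize(xs, ys, lam=1.0, iters=50):
    """Repeatedly maximize a quadratic lower bound of the objective."""
    # Fixed negative curvature: the true second derivative is never below it.
    curvature = -(0.25 * sum(x * x for x in xs) + lam)
    theta = 0.0
    trace = [objective(theta, xs, ys, lam)]
    for _ in range(iters):
        # Maximizer of the quadratic minorizer built at the current theta.
        theta -= gradient(theta, xs, ys, lam) / curvature
        trace.append(objective(theta, xs, ys, lam))
    return theta, trace

xs, ys = [1.0, -2.0, 0.5, 1.5], [1, 0, 1, 0]  # hypothetical data
theta_hat, trace = mm_maximize(xs, ys)
print(theta_hat, trace[0], trace[-1])
```

Because the curvature is fixed, it is computed once before the loop, and every iteration can only raise the objective or leave it unchanged.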



4.1 Generalized E-Step

In the generalized E-step, to learn the latent factors in
the model, we need to derive the minorizing function with
the aid of an auxiliary function for the objective (5).

Definition 1. Given the objective function $L(\Omega)$ in
Equation (5),^3 $Q(\Omega, \Omega')$ is an auxiliary function
for $L(\Omega)$ if the conditions

(i) $L(\Omega) = Q(\Omega, \Omega)$; (ii) $L(\Omega) \geq Q(\Omega, \Omega')$

are satisfied.

Lemma 1. If $Q(\Omega, \Omega')$ is an auxiliary function for
$L(\Omega)$, then $L(\Omega)$ is non-decreasing under the
following update:

$$\Omega^{(t+1)} = \arg\max_{\Omega} Q(\Omega, \Omega^{(t)})$$

where $\Omega^{(t)}$ denotes the current estimate of the
model parameter and $\Omega^{(t+1)}$ is the new estimate that
maximizes $Q$.

Proof. $L(\Omega^{(t+1)}) \geq Q(\Omega^{(t+1)}, \Omega^{(t)}) \geq Q(\Omega^{(t)}, \Omega^{(t)}) = L(\Omega^{(t)})$. □

Note that the auxiliary function $Q(\Omega, \Omega')$ defined
above is a lower bound of $L(\Omega)$, and can therefore be
taken as the minorizing function. For example, to learn the
latent feature factor $U$, we consider the objective function
$L(U)$ only with respect to $U$ while fixing the other
factors in (5). Then for an auxiliary function
$Q(U, U^{(t)})$ of a particular form, we have the following
theorem:

Theorem 1. If $K(u_i^{(t)})$ has the form:

then

$$Q(u_i, u_i^{(t)}) = L(u_i^{(t)}) + (u_i - u_i^{(t)})^{T} \nabla L(u_i^{(t)}) + \frac{1}{2} (u_i - u_i^{(t)})^{T} K(u_i^{(t)}) (u_i - u_i^{(t)})$$

is an auxiliary function for $L(u_i)$.

Proof. See Appendix I for a detailed proof. □

Thus, we can derive the minorizing function
$Q(u_i, u_i^{(t)})$ for learning the latent feature vector
$u_i$, which is also a lower bound of $L(u_i)$; we then
optimize the latent parameters by maximizing
$Q(u_i, u_i^{(t)})$ in the subsequent M-step. Note that this
optimization transfer algorithm is similar to Newton's method
for maximizing $L(u_i)$, with the Hessian at each iteration
replaced by the derived matrix $K(u_i^{(t)})$, which needs to
be inverted only once rather than at each iteration.

Similarly to the optimization for the latent factor $U$, we
can derive the minorizing functions $Q(V, V^{(t)})$,
$Q(C, C^{(t)})$ and $Q(\beta, \beta^{(t)})$ with the specific
matrices $K(v_i^{(t)})$, $K(C^{(t)})$ and $K(\beta^{(t)})$
respectively.

To learn the latent cluster assignment $z_i$ of each object,
since each object in the relational data is exclusively
assigned to a single latent cluster, we can find the optimal
value quite efficiently by maximizing the log-posterior
probability as follows:

$$z_i = \arg\max_{z_i} \Big( \sum_{j} W_{ij} \left( S_{ij} H_{ij} - \log(1 + \exp(H_{ij})) \right) \Big) + E$$

$$H_{ij} = \beta^{T} x_{ij} + u_i v_j^{T} + \sum_{k,l=1}^{K} z_{ik} C_{kl} z_{jl}$$

where $E$ is a constant independent of the latent cluster
assignment $z_i$. Moreover, since the latent cluster
assignments $z_i$ and $z_j$ are exclusively binary-valued, so
that the probability of a relation between objects $i$ and
$j$ can be mapped directly to the corresponding block value
$C_{kl}$, we can convert the term $z_i C z_j^{T}$ to the form
$C^{T} z_{ij}$, where $z_{ij}$ is the Kronecker product of
$z_i$ and $z_j$. The latent block matrix $C$ can then be
learnt more efficiently, as in a generalized linear model.

^3 Here $\Omega$ represents a latent factor while the others
are fixed.
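The cluster assignment step above can be sketched by simply trying every cluster for one object and keeping the best. This is a minimal illustration under stated assumptions: `base[i][j]` is a hypothetical array holding the part of $H_{ij}$ that does not depend on the assignment of object $i$, `z` stores hard cluster indices, and only the row of object $i$ is scored.

```python
import math

def log1p_exp(h):
    return max(h, 0.0) + math.log1p(math.exp(-abs(h)))

def update_assignment(i, S, W, base, C, z):
    """Return the cluster index k that maximizes the observed
    log-likelihood of row i, with all other assignments held fixed."""
    best_k, best_val = None, -math.inf
    for k in range(len(C)):
        val = 0.0
        for j in range(len(z)):
            if W[i][j]:  # use observed pairs only
                h = base[i][j] + C[k][z[j]]
                val += S[i][j] * h - log1p_exp(h)
        if val > best_val:
            best_k, best_val = k, val
    return best_k

# Hypothetical toy network: objects 1 is in cluster 0, objects 2 and 3 in cluster 1.
z = [0, 0, 1, 1]
C = [[3.0, -3.0], [-3.0, 3.0]]
base = [[0.0] * 4 for _ in range(4)]
W = [[0, 1, 1, 1]] + [[0] * 4 for _ in range(3)]  # only row 0 is scored here
S_a = [[0, 1, 0, 0]] + [[0] * 4 for _ in range(3)]  # object 0 links to object 1
S_b = [[0, 0, 1, 1]] + [[0] * 4 for _ in range(3)]  # object 0 links to objects 2, 3
print(update_assignment(0, S_a, W, base, C, z), update_assignment(0, S_b, W, base, C, z))
```

Each update costs one pass over the observed entries of a row per candidate cluster, which is why the assignment step stays cheap when every object belongs to exactly one cluster.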

4.2 Generalized M-Step 

In the M-step, we optimize the latent factors and model
parameters by maximizing the obtained minorizing functions,
each of which is a quadratic function of one latent factor
with the others fixed.

First, we derive the update rule for the latent feature
vector $u_i$ of each object. Based on Theorem 1, optimizing
$u_i$ is equivalent to taking a Newton step for $u_i$ in
$Q(u_i, u_i^{(t)})$ as follows:

$$u_i^{(t+1)} = u_i^{(t)} - \eta \cdot \nabla Q(u_i, u_i^{(t)}) \left[ \nabla^{2} Q(u_i, u_i^{(t)}) \right]^{-1} = u_i^{(t)} - \eta \cdot \nabla L(u_i^{(t)}) \left[ K(u_i^{(t)}) \right]^{-1}$$

where $\eta$ can be set using Armijo's rule [12].
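A backtracking search based on the Armijo sufficient-increase condition can be sketched as follows. The function name `armijo_step` and the constants `eta0`, `shrink` and `c` are conventional defaults chosen here for illustration; they are assumptions, not values reported in the paper.

```python
def armijo_step(f, x, grad, direction, eta0=1.0, shrink=0.5, c=1e-4, max_halvings=30):
    """Backtracking line search for maximizing f.

    Accepts the first step size eta with
    f(x + eta * direction) >= f(x) + c * eta * <grad, direction>.
    Returns (eta, new_x); eta is 0.0 if no step was accepted.
    """
    slope = sum(g * p for g, p in zip(grad, direction))
    fx = f(x)
    eta = eta0
    for _ in range(max_halvings):
        candidate = [xi + eta * pi for xi, pi in zip(x, direction)]
        if f(candidate) >= fx + c * eta * slope:
            return eta, candidate
        eta *= shrink
    return 0.0, list(x)

# Hypothetical concave test function with maximum at (1, -2).
def f(x):
    return -(x[0] - 1.0) ** 2 - 4.0 * (x[1] + 2.0) ** 2

x0 = [0.0, 0.0]
g0 = [-2.0 * (x0[0] - 1.0), -8.0 * (x0[1] + 2.0)]  # gradient of f at x0
eta, x1 = armijo_step(f, x0, g0, direction=g0)
print(eta, x1, f(x0), f(x1))
```

In the update above the search direction would be the gradient multiplied by the inverse of the curvature matrix; plain gradient ascent is used in this example only to keep it short.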

Then, we can derive the update rules for the latent factors
$V$ and $\beta$ similarly, using the corresponding minorizing
functions $Q(v_i, v_i^{(t)})$ and $Q(\beta, \beta^{(t)})$
respectively.

For the latent factor $C$,^4 we can also convert the
minorizing function $Q(C, C^{(t)})$ by employing the
Kronecker product of the latent cluster assignments of
objects, which makes the optimization of $C$ easy, as
follows:

$$C^{(t+1)} = C^{(t)} - \eta \cdot \nabla L(C^{(t)}) \left[ K(C^{(t)}) \right]^{-1} \qquad (9)$$

Based on the above updating rules of latent factors, the 
objective function will monotonically increase; and after com- 
bining the Minorization and Maximization step, the learn- 
ing procedure of our proposed model can converge to a local 
maximum. 
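The Kronecker-product rewriting used for the block matrix can be checked numerically. In this small sketch (pure Python; `kron`, `vec` and the example values are illustrative assumptions) the block matrix is flattened row by row, so that the block term becomes an ordinary inner product between the flattened matrix and the Kronecker product of the two indicator vectors.

```python
def kron(a, b):
    """Kronecker product of two vectors, as a flat list."""
    return [x * y for x in a for y in b]

def vec(C):
    """Row-major vectorization of a matrix stored as nested lists."""
    return [c for row in C for c in row]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

C = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
z_i = [0.0, 1.0, 0.0]  # object i is in cluster 1 (0-based)
z_j = [0.0, 0.0, 1.0]  # object j is in cluster 2 (0-based)

z_ij = kron(z_i, z_j)
print(dot(vec(C), z_ij))  # selects C[1][2]
```

Once the block term is linear in the flattened matrix, the entries of the block matrix play the same role as regression coefficients on the features formed by the Kronecker product.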

4.3 Model Complexity 

Since alternating iterative update rules are employed to
learn the latent factors, in the generalized EM algorithm
learning $U$ and $V$ requires a computation time of
$O(N d^{2})$ in each iteration, where $d$ is the dimension of
the latent feature factor and $N$ denotes the number of
observations in the data. Learning the regression
coefficients $\beta$ requires a computation time of
$O(N m^{2})$ in each iteration, where $m$ denotes the
dimension of the side-information vector, and learning the
latent block factor $C$ requires a computation time of
$O(N k^{2})$ in each iteration. Since each object in the
model is assigned to a single latent cluster, computing the
cluster assignments requires only $O(N k)$ per iteration.
Thus, assuming a small number of iterations, the overall
learning algorithm requires a computation time that is
linear in the number of observations, which makes it possible
to handle large scale relational data.

^4 We first convert the matrix $C$ to a vector.

5. EXPERIMENTS 

In this section, we demonstrate how our proposed model 
performs on both synthetic data and real world relational 
data compared with alternative methods. 

5.1 Experimental Setup 

In the experiments, we compare the following network 
modeling methods in terms of two tasks. 

• NMF model [15] is the Nonnegative Matrix Factorization
model, which is used for data clustering by learning the
latent feature factors in the latent semantic space.

• MMSB model [2] is the Mixed Membership Stochastic
Blockmodel, which uses only the relation matrix to discover
the latent cluster assignments of each object.

• MLFM model [5] is the Multiplicative Latent Factor Model,
which learns latent feature factors from the relational data
under a Bernoulli distribution assumption.

• GLFM model [S] is the Generalized Latent Factor Model, 
which generalizes the MLFM model to learn more ac- 
curate latent feature factors. 

• LFBM model is our proposed Latent Factor Block Model. 

To examine how well the compared models perform on the
relational data, we evaluate two related tasks, link
prediction and cluster analysis:

• The link prediction task checks the generalization and
prediction performance of the compared models when the
relational data is sparse and noisy. Without loss of
generality, we use only the relation matrices with the
objects' identifiers, without considering side information.
We generate random train/test splits from the relational
data, and compute the average AUC (the Area Under the
receiver operating characteristic Curve) against the ground
truth test data.

• The cluster analysis task checks how the ability to
discover the latent cluster assignments differs between our
proposed model and the other models. In this task, we
consider a dataset with content information as well as
relational structure. We use NMI (Normalized Mutual
Information), a standard measure of cluster quality, as the
metric for the clustering accuracy of the models. For the
compared models, we set the number of latent clusters to the
ground-truth number of class labels in the data.
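For the link prediction metric, AUC can be computed directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive test entry receives the higher score, with ties counted as one half. The sketch below is a plain illustration (the function name `auc` and the toy scores are assumptions); it is quadratic in the number of test entries, so a rank-based implementation would be preferable on large test sets.

```python
def auc(scores, labels):
    """Area under the ROC curve from the pairwise ranking definition."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical predicted link probabilities for four held-out pairs.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

In the setup described above, `scores` would be the predicted probabilities of the held-out entries and `labels` their true values in the relation matrix.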

5.2 Experiment on Synthetic Data 

Figure 2: ROC curves of different methods for the link
prediction task on synthetic data. The average AUC values
are also shown.

Figure 4: The NMI performance for the latent cluster
assignments of each object using different models.

We first use synthetic binary relational data to examine our
models. We generate a synthetic data matrix of 200 objects
with noise, representing a network with three clusters as
shown in Figure 3(a). Specifically, in the first two clusters
the objects are fully connected (i.e., the corresponding
sub-matrices are dense clusters), while within the third
cluster the objects are not interconnected (i.e., the
corresponding sub-matrix is sparse) but are connected to the
objects in the first cluster.
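A data set with this structure can be generated along the following lines. The cluster sizes, the noise rate, the random seed and the choice to link the third and first clusters in both directions are all assumptions made for this sketch; the paper does not report them.

```python
import random

def make_synthetic(sizes=(80, 60, 60), noise=0.05, seed=0):
    """Binary relation matrix with two dense clusters and one sparse
    cluster that is linked to the first cluster; each entry is flipped
    independently with probability `noise`."""
    rng = random.Random(seed)
    labels = [k for k, n in enumerate(sizes) for _ in range(n)]
    # Pairs of cluster labels whose blocks are fully connected before noise.
    linked = {(0, 0), (1, 1), (0, 2), (2, 0)}
    n = len(labels)
    S = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            bit = 1 if (labels[i], labels[j]) in linked else 0
            if rng.random() < noise:
                bit = 1 - bit
            S[i][j] = bit
    return S, labels

def block_density(S, labels, a, b):
    """Fraction of ones among pairs (i, j) with labels (a, b)."""
    cells = [S[i][j] for i in range(len(S)) for j in range(len(S))
             if labels[i] == a and labels[j] == b]
    return sum(cells) / len(cells)

S, labels = make_synthetic()
print(block_density(S, labels, 0, 0), block_density(S, labels, 2, 2),
      block_density(S, labels, 2, 0))
```

The third cluster is the interesting case: its members are not linked to each other, yet they share the same pattern of links to the first cluster, which is what stochastic equivalence means.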

To check how well the proposed model fits the relational
data, we conduct the task of reconstructing the original
data. Figure 3 shows the reconstructions obtained by fitting
different models to the data. From the results, we find that
MMSB, GLFM and our proposed LFBM model reveal clearer
structures than the NMF and MLFM models, which indicates the
limitations of latent factor models based only on
factorization. For fair comparison, the hyper-parameters are
selected from a wide range and the best selections are
reported. We set the number of latent clusters to $k = 3$
and the dimension of the latent feature factors to $d = 2$
for all the models; $\eta$ is set to 0.2, and $\lambda_U$,
$\lambda_V$ and $\lambda_C$ are set to 1. Note that how to
choose the optimal number of latent clusters is beyond the
scope of this paper and is left for future work.

We then check the performance of the compared models on the
link prediction task. We randomly choose 90% of the relation
data as the training data by setting the weight matrix $W$,
and leave the rest as missing data for testing. We evaluate
the models by repeating the process five times and report
the average results. Figure 2 shows the ROC curves and the
average AUC of the different models for the link prediction
task. It can be observed that the best performing method
among all the models is our proposed LFBM model, which
indicates that it provides better generalization and
predictive performance than the MMSB model, which is based
only on cluster structure, and also beats the other
factorization based models (e.g., NMF and MLFM) because of
its flexibility to discover special clusters and to
integrate the benefits of latent block structure.
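The random split described above amounts to drawing the indicator matrix at random. A minimal sketch follows, assuming that every entry of the matrix (including the diagonal) is eligible and that the seed and function name are free choices; none of these details are stated in the paper.

```python
import random

def random_mask(n, train_fraction=0.9, seed=0):
    """Indicator matrix W with W[i][j] = 1 for training entries and
    0 for held-out (treated as missing) entries."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    rng.shuffle(pairs)
    n_train = int(round(train_fraction * len(pairs)))
    W = [[0] * n for _ in range(n)]
    for i, j in pairs[:n_train]:
        W[i][j] = 1
    return W

W = random_mask(10)
held_out = [(i, j) for i in range(10) for j in range(10) if W[i][j] == 0]
print(sum(map(sum, W)), len(held_out))
```

Repeating the experiment with different seeds and averaging the resulting AUC values corresponds to the repeated evaluation described above.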

Figure 5: The average AUC values for MMSB and LFBM models
when varying the number k of latent clusters on the
LiveJournal dataset with d = 20.

We also conduct experiments to check the latent cluster
assignments learned by each model. We use the NMI between the
resulting cluster labels and the ground truth labels to
measure the cluster quality. For our proposed LFBM model and
the MMSB model, the resulting cluster labels can be obtained
directly from the latent cluster assignment factors in the
models, while for the NMF, MLFM and GLFM models we use the
latent feature factors $U$ to determine the cluster label of
each data point, assigning object $i$ to latent cluster $k$
if $k = \arg\max_{d} U_{id}$. The average performance scores
are reported in Figure 4. We observe that our proposed model
has an NMI score comparable with that of the MMSB model,
which considers only the latent structure in the data, and
better than those of the other models based on latent
feature factors, which illustrates the flexibility and
generality of the LFBM model in analyzing relational data.
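The two evaluation steps described above, taking the largest latent feature as the cluster label and scoring the labels with NMI, can be sketched as follows. The normalization by the geometric mean of the two entropies is one common convention and is an assumption here, since the paper does not say which variant it uses; the function names are also illustrative.

```python
import math
from collections import Counter

def labels_from_factors(U):
    """Assign each object to the index of its largest latent feature."""
    return [max(range(len(u)), key=lambda d: u[d]) for u in U]

def entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts.values())

def nmi(a, b):
    """Mutual information of two labelings divided by sqrt(H(a) * H(b))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    ha, hb = entropy(ca, n), entropy(cb, n)
    if ha == 0.0 or hb == 0.0:  # a labeling with a single cluster
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)

U = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4], [1.5, 0.2]]  # hypothetical factors
predicted = labels_from_factors(U)
truth = [0, 1, 0, 1]
print(predicted, nmi(predicted, truth))
```

NMI does not depend on how the clusters are numbered, so the predicted labels in this toy example reach the maximum score even though every index is swapped relative to the ground truth.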

5.3 Experiment on Real World Datasets 

We compare the performance of the proposed model with the
other models on real world datasets for different tasks. We
also report results for a varying number of latent features
$d$ and a varying number of latent clusters $k$. In general,
increasing the number of parameters improves the performance;
however, there is a trade-off between complexity and
performance in the models.

5.3.1 Case Study 1: Link Prediction in Social Networks

We use two social network datasets for the link prediction
experiments. The first dataset is online social network data
from the LiveJournal website. The LiveJournal dataset [5]
contains binary social friendships between users of the
website, consisting of 3,773 users and 209,832 social links.
The second dataset is a coauthorship dataset from the arXiv
archive [4]. The Coauthorship dataset contains binary
coauthorship relations indicating whether two authors have
written a paper together, consisting of 2,403 authors and
21,397 coauthorships. In both datasets, we do not use any
side information, and only learn the latent feature factors
and latent block structural factors. For each of these
datasets, we randomly choose 80% of the relation matrix
entries for training, then treat the remaining 20% of
entries as missing and predict them for testing. The
experiment is repeated 10 times. We evaluate all the models
by AUC values averaged over five times. We set the number of
latent clusters to $k = 20$, and set the dimension of latent
feature factors to $d = 20$ for the LiveJournal dataset and
$d = 40$ for the Coauthor dataset in all the models; $\eta$
is set to 0.25, and $\lambda_U$, $\lambda_V$ and $\lambda_C$
are set to 0.5.

Figure 3: The synthetic data consists of three clusters.
(a) Original data with noise. (b) Reconstructed data matrix
using NMF. (c) Reconstructed data matrix using MMSB.
(d) Reconstructed data matrix using MLFM. (e) Reconstructed
data matrix using GLFM. (f) Reconstructed data matrix using
LFBM.

Table 1: Average AUC performances on LiveJournal and Coauthor
datasets using different models when varying the dimension of
latent feature factors. Best results are in bold.

Model | LiveJournal                   | Coauthor
      | ...    ...    d=30   d=40     | ...    ...    d=40   d=50
NMF   | 0.7370 0.7468 0.7579 0.7612   | 0.6801 0.6973 0.7128 0.7212
MMSB  | 0.6512 0.6512 0.6512 0.6512   | 0.6099 0.6099 0.6099 0.6099
MLFM  | 0.7804 0.8023 0.8103 0.8115   | 0.7345 0.7432 0.7486 0.7521
GLFM  | 0.8115 0.8319 0.8401 0.8483   | 0.7676 0.7801 0.7852 0.7967
LFBM  | 0.8568 0.8720 0.8793 0.8805   | 0.8029 0.8105 0.8213 0.8232

Experimental results are shown in Table 1. We find that our
proposed LFBM model outperforms all the other models on both
datasets, which suggests that integrating the effects of
learning latent feature information and modeling latent
block structure leads to better performance than models that
do not consider both effects simultaneously. Taking the
LiveJournal dataset as an example, compared with the local
structure based MMSB model, LFBM achieves a much higher AUC
(0.8568 to 0.8805 versus 0.6512), which reflects the
excellent predictive performance of latent feature
factorization based methods on the link prediction task.
With respect to the factorization based models (e.g. NMF,
MLFM, GLFM), which only learn the latent feature factors,
the LFBM model improves the AUC by roughly 4% to 17% in
relative terms, indicating that exploiting the latent
structure in the link prediction task is quite effective for
achieving better performance. It is also important to note
that our proposed LFBM model and the other compared models
assume a Bernoulli distribution, whereas the NMF model
assumes a Normal distribution. This should remind us that
although the Normal distribution is the most popular in
network modeling, the LFBM model under a Bernoulli or other
suitable distribution may be more appropriate in link
prediction applications due to the statistical properties of
relational data.

Moreover, Table 1 also shows how performance changes as 
the dimension d of the latent feature factors varies over 
a wide range. Within this range, the higher the dimension of 
the latent feature representation, the better the predictive 




Table 2: NMI performances on Cora dataset using 
different models. Best results are in bold and second 
best in italic. 



Figure 6: The average AUC values for MMSB and 
LFBM models when varying the number k of latent 
clusters on Coauthor dataset with d = 40. 



performance is. To balance model complexity against 
performance, we fix d to 20 for the LiveJournal dataset and 
to 40 for the Coauthor dataset in the experiments, since the 
relation matrix of the LiveJournal social network is 
somewhat denser than that of the Coauthor dataset. 

To examine the effect of latent cluster structure on the 
link prediction task, we also vary the number of latent 
clusters k from 10 to 40 on both datasets and compare the 
related models, i.e., MMSB and LFBM. Since we do not have 
the true cluster label of each object in these real world 
datasets, we cannot evaluate the estimation accuracy of the 
latent cluster assignments, and thus we only look at the 
average AUC on the missing data. From Figure 5 and Figure 6 
we observe that LFBM clearly outperforms the MMSB model for 
every number of latent clusters on both datasets, which 
again supports the benefit of simultaneously incorporating 
both latent feature factorization and latent cluster 
information when constructing predictive models for 
relational data. 

5.3.2 Case Study 2: Cluster Analysis in Citation Networks 

In this study, we consider a paper citation dataset for 
the cluster analysis task. The experimental data is the Cora 
dataset, which contains 2708 papers from 7 subfields 
(i.e., probabilistic methods, case-based reasoning, genetic 
algorithms, neural networks, reinforcement learning, rule 
learning and theory) and 5429 citations between these 
papers. Each paper has side information consisting of a 
binary word vector indicating the absence or presence of the 
corresponding word from a dictionary. In the cluster 
detection task, for the MMSB and LFBM models, we set the 
number of latent clusters to the ground-truth number of 
class labels in the data, while for the other 
factorization-style models (e.g., NMF, MLFM, GLFM), we set 
the dimension of the latent feature factors to the number of 
class labels to simulate the latent cluster assignment for 
a fair comparison. Note that the correlation between the 
dimension of the latent feature factors and the number of 
latent clusters will be investigated in our future research. 
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Here we use a different approach from that of [8], which 
adopts k-means to perform clustering based on the normalized 
latent factors. 
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Figure 7: NMI performance of LFBM model when 
varying the dimension of latent feature factor d. 



In the experiments, we evaluate the latent cluster 
assignment performance of all the models in terms of the NMI 
score. We set η = 0.2, set λ_U, λ_V and λ_C to 0.5, and set 
the dimension of latent feature factors in the LFBM model to 
20. The results are reported in Table 2. LFBM achieves the 
best performance among all the models, which indicates that, 
by including both latent feature and latent block factors, 
our model reveals a clearer cluster structure than the MMSB 
model and the other latent feature factor based models. 
Specifically, the latent cluster structures obtained by the 
LFBM model after adjusting for latent feature factors are 
more informative than those obtained by the MMSB model. 
Moreover, the paper citation network contains both sparse 
clusters, in which papers may be grouped by their content 
information even without citations, and dense clusters, in 
which citations often exist among papers. The GLFM and MMSB 
models can find only the dense ones, while our proposed LFBM 
model is flexible enough to reveal the mixed structures and 
obtains much better clustering accuracy. 
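
For reference, the NMI score between ground-truth class labels and predicted cluster assignments can be computed as in the sketch below. The normalization by the geometric mean of the two entropies is an assumption, since the text does not state which NMI variant is used, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """Normalized mutual information, I(T; P) / sqrt(H(T) * H(P)).

    The geometric-mean normalization is an assumed choice; other variants
    divide by the arithmetic mean or the maximum of the entropies.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    mutual_info = 0.0
    for (t, p), c in joint.items():
        mutual_info += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)
```

The score is invariant to how cluster indices are named, so a perfect clustering with permuted labels still scores 1, while an assignment independent of the true classes scores 0.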

Since the LFBM model constructs both latent feature factors 
and latent structure factors to model the relational data, 
we also examine how performance changes as the dimension d 
of the latent feature factors varies. Figure 7 shows the 
NMI performance in these experiments: the higher the 
dimensionality of the latent factors, the better the 
performance of the LFBM model, up to a certain point. As a 
compromise between complexity and performance, we select 
d = 20 in the experiments. 

6. RELATED WORK 

For the problem of modeling relational data, there are var- 
ious approaches, which can be classified into latent feature 
based and latent cluster structure based models. 

Latent Feature Based Models: Latent feature models are 
based on matrix or tensor factorization. They learn a 
distributed representation for each object and each relation, 
and then make predictions by taking appropriate inner 
products. Their strength lies in the relative ease of their 
continuous optimization and in their excellent predictive 
performance. Representative models are the Multiplicative 
Latent Factor Model (MLFM) [H] and the Generalized Latent 
Factor Model (GLFM) [8]. For example, MLFM includes both 
the latent class model and the latent distance model as 
special cases, and can to some extent capture both homophily 
and stochastic equivalence in networks. However, for this 
kind of latent factor model, the learned latent structure is 
often hard to understand and analyze. There is also a 
log-linear model with latent features for dyadic prediction 
in relational data [11]. 
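
As a minimal sketch of how a latent feature model turns inner products into link predictions, the snippet below maps the inner product of two latent vectors through a logistic function to obtain a Bernoulli link probability. It illustrates the general idea only and is not the parameterization of MLFM, GLFM or LFBM; the names `sigmoid` and `link_probability` and the vectors `u_i`, `v_j` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def link_probability(u_i, v_j):
    """Probability of a link between objects i and j, assumed here to be
    the logistic function of the inner product of their latent vectors."""
    if len(u_i) != len(v_j):
        raise ValueError("latent vectors must have the same dimension d")
    return sigmoid(sum(a * b for a, b in zip(u_i, v_j)))
```

Under this assumption, objects whose latent vectors are well aligned receive a probability close to 1, and orthogonal vectors give exactly 0.5.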

Latent Structure Based Models: The latent structure 
based models provide latent building blocks for complex 
networks and allow us to understand and predict unknown 
interactions between network nodes. For example, stochastic 
blockmodels [Ts] adopt mixture models for relational data. 
In this model, the cluster of each node is sampled from a 
multinomial distribution. To allow a node to belong to 
multiple groups, Airoldi et al. [2] developed mixed 
membership stochastic blockmodels, which use a latent 
Dirichlet allocation prior to model latent membership 
variables. Agarwal et al. [1] proposed the Predictive 
Discrete Latent Factor (PDLF) model to predict large scale 
dyadic response variables. The model simultaneously 
incorporates the effect of covariates and estimates local 
structure that is induced by interactions among the dyads 
through a discrete latent factor model. There is also 
similar work in ^1^. Another related line of research is 
relational clustering. Long et al. [9] and [10] proposed a 
general model for relational clustering based on symmetric 
convex coding. 
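
The generative view of a basic stochastic blockmodel described above can be sketched as follows: each node draws a single cluster from a multinomial distribution, and each pair of nodes is then linked with a probability that depends only on the two clusters. The cluster proportions `pi`, the block probability matrix `B`, the undirected graph and the function name `sample_sbm` are all assumptions made for illustration.

```python
import random


def sample_sbm(n, pi, B, seed=0):
    """Sample cluster labels and an undirected adjacency matrix.

    n:  number of nodes
    pi: cluster proportions (multinomial parameters), length k
    B:  k x k matrix of link probabilities between clusters
    """
    rng = random.Random(seed)
    k = len(pi)
    # Each node is assigned to exactly one cluster.
    z = rng.choices(range(k), weights=pi, k=n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The link probability depends only on the two cluster labels.
            link = 1 if rng.random() < B[z[i]][z[j]] else 0
            A[i][j] = A[j][i] = link
    return z, A
```

A mixed membership model replaces the single label per node with a per-node distribution over clusters, which is what allows a node to belong to multiple groups.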

An alternative approach to modeling dependencies among 
relational data is the relational Gaussian process model 
[17], which has been successfully applied to a variety of 
relational learning problems. Essentially, Yu et al. [17] 
use a linear covariance function and a Gaussian likelihood 
in their relational GP model. Yan et al. [16] proposed the 
sparse matrix-variate Gaussian process blockmodel to 
generalize bilinear generative models to handle nonlinear 
network interactions. 

7. CONCLUSION AND FUTURE WORK 

In this paper, we have addressed the problem of modeling 
relational data. We proposed a novel model that 
simultaneously incorporates the effects of latent feature 
factors and of the latent block structure in the network. 
The model collectively captures globally predictive 
intrinsic properties of objects and discovers the latent 
block structure, showing that coupling latent feature 
factorization based approaches with latent class based 
approaches yields better predictive performance and a much 
clearer latent block structure. We also employed an 
efficient optimization transfer algorithm to alleviate the 
model complexity. Extensive experiments on synthetic data 
and several real world datasets suggest that our proposed 
LFBM model outperforms the other state of the art approaches 
in the evaluated tasks for modeling relational data. 

There are still directions remaining to be explored. First, 
it would be interesting to investigate how to automatically 
choose the number of latent feature factors and the number 
of latent clusters from the data, and to reveal the 
correlations between them. Second, the model learning in 
this paper employs the MAP strategy; it would be promising 
to use a Bayesian approach that marginalizes over the latent 
factors and model parameters. 
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APPENDIX 

A. APPENDIX I: PROOF OF THEOREM 1 

To prove Theorem 1, we make use of the auxiliary function 
$Q(u_i, u_i^{(t)})$, and derive the proof as follows: 

Proof. Given the form of $Q(u_i, u_i^{(t)})$, the auxiliary 
function should satisfy the required conditions in 
Definition 1. For the first condition, it is easy to observe 
that $L(u_i) = Q(u_i, u_i)$. For the second condition, 
$L(u_i) \geq Q(u_i, u_i^{(t)})$, since the objective function 
$L(u_i)$ is convex with bounded curvature, we first derive 
the Taylor series of $L(u_i)$ with respect only to $U$ as 
follows: 

$$L(u_i) = L(u_i^{(t)}) + (u_i - u_i^{(t)})^T \nabla L(u_i^{(t)}) + \frac{1}{2} (u_i - u_i^{(t)})^T \nabla^2 L(u_i^{(t)}) (u_i - u_i^{(t)})$$

where $\nabla^2 L(u_i^{(t)})$ can be computed as follows: 

$$\nabla^2 L(u_i^{(t)}) = -\sum_j \left\{ W_{ij} \, \sigma(H_{ij}) (1 - \sigma(H_{ij})) \, v_j v_j^T \right\} - \lambda_U I$$

Based on the inequality $\sigma(x)(1 - \sigma(x)) \leq \frac{1}{4}$, 
it is trivial to observe that $\mathcal{H}(u_i^{(t)})$ and 
$\nabla^2 L(u_i^{(t)}) - \mathcal{H}(u_i^{(t)})$ are positive 
semi-definite matrices. 

Comparing the forms of $L(u_i)$ and $Q(u_i, u_i^{(t)})$: 

$$Q(u_i, u_i^{(t)}) = L(u_i^{(t)}) + (u_i - u_i^{(t)})^T \nabla L(u_i^{(t)}) + \frac{1}{2} (u_i - u_i^{(t)})^T \mathcal{H}(u_i^{(t)}) (u_i - u_i^{(t)})$$

we can observe that $L(u_i) \geq Q(u_i, u_i^{(t)})$. 

Thus $Q(u_i, u_i^{(t)})$ is the auxiliary function for 
$L(u_i)$, which is also the lower bound of $L(u_i)$. 
Theorem 1 is proved. □ 
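
The scalar bound on the logistic function invoked in the proof can be checked in one line. The short derivation below is added for completeness; the symbol $s$ is introduced here only for illustration.

```latex
% Let s = \sigma(x) = 1 / (1 + e^{-x}), so that 0 < s < 1 for every real x.
\[
  \sigma(x)\bigl(1 - \sigma(x)\bigr)
  = s(1 - s)
  = \frac{1}{4} - \Bigl(s - \frac{1}{2}\Bigr)^{2}
  \le \frac{1}{4},
\]
% with equality if and only if s = 1/2, that is, x = 0.
```

Since the bound holds for every value of the argument, the curvature contributed by each logistic term is at most one quarter, whatever the current iterate is.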



